    Optimistic Agents are Asymptotically Optimal

    We use optimism to introduce generic asymptotically optimal reinforcement learning agents. They achieve asymptotically optimal behavior for an arbitrary finite or compact class of environments. Furthermore, in the finite deterministic case we provide finite error bounds. Comment: 13 LaTeX page

    On the Computability of Solomonoff Induction and Knowledge-Seeking

    Solomonoff induction is held as a gold standard for learning, but it is known to be incomputable. We quantify its incomputability by placing various flavors of Solomonoff's prior M in the arithmetical hierarchy. We also derive computability bounds for knowledge-seeking agents, and give a limit-computable weakly asymptotically optimal reinforcement learning agent. Comment: ALT 201

    Extreme State Aggregation Beyond MDPs

    We consider a Reinforcement Learning setup where an agent interacts with an environment in observation-reward-action cycles without any (esp. MDP) assumptions on the environment. State aggregation and, more generally, feature reinforcement learning are concerned with mapping histories/raw-states to reduced/aggregated states. The idea behind both is that the resulting reduced process (approximately) forms a small stationary finite-state MDP, which can then be efficiently solved or learnt. We considerably generalize existing aggregation results by showing that even if the reduced process is not an MDP, the (q-)value functions and (optimal) policies of an associated MDP with the same state-space size solve the original problem, as long as the solution can approximately be represented as a function of the reduced states. This implies an upper bound on the required state space size that holds uniformly for all RL problems. It may also explain why RL algorithms designed for MDPs sometimes perform well beyond MDPs. Comment: 28 LaTeX pages. 8 Theorem

    Following the Leader and Fast Rates in Linear Prediction: Curved Constraint Sets and Other Regularities

    The follow-the-leader (FTL) algorithm, perhaps the simplest of all online learning algorithms, is known to perform well when the loss functions it is used on are positively curved. In this paper we ask whether there are other “lucky” settings when FTL achieves sublinear, “small” regret. In particular, we study the fundamental problem of linear prediction over a non-empty convex, compact domain. Amongst other results, we prove that the curvature of the boundary of the domain can act as if the losses were curved: in this case, we prove that as long as the mean of the loss vectors have positive lengths bounded away from zero, FTL enjoys a logarithmic growth rate of regret, while, e.g., for polyhedral domains and stochastic data it enjoys finite expected regret. Building on a previously known meta-algorithm, we also get an algorithm that simultaneously enjoys the worst-case guarantees and the bound available for FTL.

    Universal knowledge-seeking agents for stochastic environments

    We define an optimal Bayesian knowledge-seeking agent, KL-KSA, designed for countable hypothesis classes of stochastic environments, whose goal is to gather as much information about the unknown world as possible. Although this agent works for arbitrary countable classes and priors, we focus on the especially interesting case where all stochastic computable environments are considered and the prior is based on Solomonoff’s universal prior. Among other properties, we show that KL-KSA learns the true environment in the sense that it learns to predict the consequences of actions it does not take. We show that it does not consider noise to be information and avoids taking actions leading to inescapable traps. We also present a variety of toy experiments demonstrating that KL-KSA behaves according to expectation.

    Bayesian reinforcement learning with exploration

    We consider a general reinforcement learning problem and show that carefully combining the Bayesian optimal policy and an exploring policy leads to minimax sample-complexity bounds in a very general class of (history-based) environments. We also prove lower bounds and show that the new algorithm displays adaptive behaviour when the environment is easier than worst-case.

    Irus and his jovial crew: representations of beggars in Vincent Bourne and other eighteenth-century writers of Latin verse

    Alastair Fowler has written, with reference to the time of Milton, of ‘Latin's special role in a bilingual culture’, and this was still true in the early eighteenth century. The education of the elite placed great emphasis on the art of writing Latin verse and modern, as well as ancient, writers of Latin continued to be widely read. Collections of Latin verse, by individual writers such as Vincent Bourne (c. 1694–1747) or by groups such as Westminster schoolboys or bachelors of Christ Church, Oxford, could run into multiple editions, and included poems on a wide range of contemporary topics, as well as reworkings of classical themes. This paper examines a number of eighteenth-century Latin poems dealing with beggars, several of which are here translated for the first time. Particular attention is paid to the way in which the Latin poems recycled well-worn tropes about beggary which were often at variance with the experience of real-life beggars, and to how the specificities of Latin verse might heighten negative representations of beggars in a genre which, as a manifestation of elite culture, appealed to the very class which was politically and legally responsible for controlling them

    Sequential Extensions of Causal and Evidential Decision Theory

    Moving beyond the dualistic view in AI where agent and environment are separated incurs new challenges for decision making, as calculation of expected utility is no longer straightforward. The non-dualistic decision theory literature is split between causal decision theory and evidential decision theory. We extend these decision algorithms to the sequential setting where the agent alternates between taking actions and observing their consequences. We find that evidential decision theory has two natural extensions while causal decision theory only has one. Comment: ADT 201